Abstract

Most previous visual recognition systems simply assume ideal inputs
without real-world degradations, such as low resolution, motion blur
and out-of-focus blur. In the presence of such unknown degradations,
the conventional approach first resorts to blind image restoration and
then feeds the restored image into a classifier. Because it treats
restoration and recognition separately, this straightforward approach
suffers greatly from the defective output of the ill-posed blind image
restoration. In this paper, we present a joint blind image restoration
and recognition method based on the sparse representation prior to
handle the challenging problem of face recognition from low-quality
images, where the degradation model is realistic and totally unknown.
The sparse representation prior states that the degraded input image,
if correctly restored, will have a good sparse representation in terms
of the training set, which indicates the identity of the test image.
The proposed algorithm achieves simultaneous restoration and
recognition by iteratively solving the blind image restoration in
pursuit of the sparsest representation for recognition. Based on such a
sparse representation prior, we demonstrate that the image restoration
task and the recognition task can benefit greatly from each other.
Extensive experiments on face datasets under various degradations are
carried out, and the results of our joint model show significant
improvements over conventional methods that treat the two tasks
independently.

1.  Introduction 

In many real-world applications, such as video surveillance, the
target of interest in the captured image usually suffers from low
quality, such as low resolution due to the long distance of the target,
motion blur due to the relative motion between the target and the
camera, out-of-focus blur if the target is not in the focus of the
capture device, or even some complex combination of these factors. In
such practical scenarios, performing many high-level vision tasks such
as recognition is a big challenge.

Figure 1. Sparse Representation based Joint Blind Restoration and
Recognition (JRR) framework. Given a blurry observation, JRR
iteratively estimates the PSF and the underlying identity based on the
sparse representation prior. The algorithm outputs the estimated PSF, a
deblurred image, and the identity of the observation.

A natural solution to this problem would be to first perform image
restoration to obtain an image with better quality [6, 14, 19], and
then feed the restored result into a recognition system. Such a
straightforward approach has the problem that many restoration
algorithms are designed to improve human visual perception only, rather
than machine perception, so there is no guarantee of recognition
improvements. Even worse, when the degradation model is unknown,
general-purpose restoration schemes, such as deblurring, do not perform
well on some realistic images that do not exhibit strong edge
structures, such as faces, and will typically introduce severe
artifacts that actually deteriorate the recognition performance.
Instead of restoring the test image, another approach could be to
estimate the degradation model first, use it to transform the training
images, and then compare the input test image with the synthetically
generated training set. This method generally works better than the
previous one, but it may easily fail for realistic data whose
degradation model is very complex.

Image deblurring is a long-standing restoration problem in the image
processing and computer vision communities [6, 14]. Recent works have
shown that it is possible to estimate both the blur kernel and the
latent sharp image with high quality from a single blurry observation
[6, 14, 11, 1, 2]. However, these methods rely on the key assumption
that strong edge structures exist in the latent image, which helps the
algorithms find a meaningful local minimum [11]. In situations with few
strong edge structures, e.g., face images, these methods may fail.
Although much progress has been made on pure image restoration, only a
few works have studied the impact of restoration on recognition, or
vice versa, the effect of recognition on restoration. The method in [3]
alternated between recognition and restoration to change the patch
sampling prior using non-parametric belief propagation for digit
recognition, with the assumption of a known image blur model.
Hennings-Yeomans et al. [9] proposed a method to extract features from
both the low-resolution faces and their super-resolved versions within
a single energy minimization framework. Nishiyama et al. [12] proposed
to improve the recognition of blurry faces with a pre-defined finite
set of Point Spread Function (PSF) kernels. However, these methods only
deal with some simple image degradations.

We present in this paper a Joint image Restoration and Recognition
(JRR) approach based on the sparse representation prior for face
images, to handle the challenging task of face recognition from
low-quality images in a blind setting, i.e., with no a priori knowledge
of the blur kernels, which can be non-parametric and very complex
(Figure 1). We assume that we have sharp and clean training face images
for all the test subjects, and that the degraded test image, if
correctly restored, can be well represented as a linear combination of
the training faces from the same subject up to some sparse errors, thus
leading to a sparse representation in terms of all the training faces.
When the test subject is not present in the gallery, our sparse
representation assumption is violated, and in principle the test
subject can be rejected via an approach similar to that in [18], which
is not considered in the current paper. With such a sparse
representation prior, the proposed method connects restoration and
recognition in a unified framework by seeking sparse representations
over the training faces via $\ell_1$-norm minimization. On one hand, a
better restored image can be better represented by the images from the
same class, leading to a sparser representation in terms of the
training set, thus facilitating recognition; on the other hand, a
better resolved sparse representation, which implies better recognition
ability, can give a more meaningful regularization in the solution
space for blind restoration. Our approach iteratively restores the
input image by searching for the sparsest representation, which can
correct an initial, possibly erroneous recognition decision and
recognize the person's identity with increasing confidence.

The rest of the paper is organized as follows. Section 2 summarizes
the role of sparsity in both image restoration and recognition.
Section 3 proposes our JRR framework and presents an efficient
optimization procedure to find the solution. Experiments on face
datasets under various realistic degradations are carried out in
Section 4. Finally, we discuss the results and conclude the paper in
Section 5.

2.  Sparse  Representation  in  Restoration  and 
Recognition 

In this section, we briefly introduce the basics of sparse
representation and summarize its applications to both ill-posed
inverse image restoration and pattern recognition.

2.1.  Sparse  Representation 

Sparse representation modeling of data assumes an ability to describe
signals as a linear combination of a few atoms from a pre-specified
dictionary. Formally, given a signal $x \in \mathbb{R}^m$ and a
dictionary $D = [d_1, d_2, \cdots, d_n] \in \mathbb{R}^{m \times n}$,
where typically $m < n$, we can recover a sparse representation
($\epsilon = 0$) or sparse approximation ($\epsilon > 0$) $\alpha$
for $x$ by:

$$\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.} \quad \|x - D\alpha\|_2^2 \le \epsilon. \tag{1}$$


The model seeks the most compact representation for the signal $x$
given the dictionary $D$, which can be an orthogonal basis ($m = n$),
an over-complete basis ($m < n$) [1], or a dictionary learned from the
training data [19]. For an orthonormal basis, the solution to (1) is
merely the inner products of the signal with the basis. However, for a
general dictionary (non-orthogonal and over-complete), the optimization
for (1) is combinatorially NP-hard. Recent works show that this NP-hard
problem can be tackled by replacing the non-convex $\ell_0$-norm with
the $\ell_1$-norm under some mild conditions [ ], which makes the
objective function convex while the exact solution can still be
guaranteed. Using the Lagrange multiplier, we can reformulate the
relaxed problem as

$$\hat{\alpha} = \arg\min_{\alpha} \|D\alpha - x\|_2^2 + \lambda \|\alpha\|_1. \tag{2}$$
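As an illustration of how a problem of the form (2) can be solved numerically, the following is a minimal sketch using iterative soft-thresholding (ISTA). The paper does not state which $\ell_1$ solver is used, so the choice of ISTA, the function names `soft_threshold` and `ista`, and the iteration count are assumptions made only for this example.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, x, lam, n_iter=500):
    """Approximately minimize ||D a - x||_2^2 + lam * ||a||_1 (hypothetical helper)."""
    # Step size 1/L, where L = 2 * sigma_max(D)^2 is the Lipschitz constant
    # of the gradient of the quadratic data term.
    L = 2.0 * np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ a - x)            # gradient of ||D a - x||_2^2
        a = soft_threshold(a - grad / L, lam / L)  # proximal (shrinkage) step
    return a
```

For an orthonormal dictionary the iteration reduces to a single shrinkage of the inner products, which matches the remark above about orthonormal bases.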


Sparsity  plays  an  important  or  even  crucial  role  in  many 
fields,  such  as  image  restoration  [5,  1,  19],  compressive 
sensing,  and  recognition  [17,  18].  In  the  following,  we  will 
make  a  brief  discussion  on  the  role  of  sparsity  in  both  image 
restoration  and  pattern  recognition. 

2.2.  Sparsity  in  Image  Restoration 

A close inspection of the progress made in the field of image
processing in the past decades reveals that much of it can be
attributed to better modeling of the image content [5]. Sparsity is
arguably the most widely used prior for image restoration, such as
image denoising, inpainting, super-resolution and deblurring [5]. Among
these, we specifically focus the discussion on image deblurring.

Image blur is a widespread degradation factor in real-life imaging
(e.g., surveillance), possibly resulting from defocus or relative
motion between the object and the camera, to name a few causes [6, 14],
and it may have a severe adverse impact on both human perception and
machine perception (e.g., classification). Assuming a convolutional
blur model and additive white Gaussian noise, the low-quality image
observation process can be modeled as [ ]:

$$y = k * x + \varepsilon, \tag{3}$$

where $k$ is the PSF (blur kernel), $\varepsilon$ is the noise, and
$*$ denotes the convolution operator. The problem of (blind) deblurring
is to estimate both the latent sharp image $x$ and the blur kernel $k$
from the blurry and noisy observation $y$. With more unknowns than
knowns, this is a typical ill-posed inverse problem, thus requiring
regularization to stabilize the solution:

$$\{\hat{x}, \hat{k}\} = \arg\min_{x, k} \|k * x - y\|_2^2 + \lambda \rho(x) + \gamma \varrho(k), \tag{4}$$

where $\rho(x)$ is a regularization term on the desired image, and
$\varrho(k)$ regularizes the possible blur kernels, typically an
$\ell_2$-norm penalty [2]. Most of the current restoration methods can
be cast into such a regularization framework, where the regularization
terms based on image priors are crucial for obtaining better
restoration results and are related in some way to the sparse property
of natural images [6, 14, 10, 19, 1, 5]. With the sparsity prior as
regularization, we can arrive at the following formulation:

$$\{\hat{x}, \hat{k}\} = \arg\min_{x, k} \|k * x - y\|_2^2 + \lambda \|D^T x\|_1 + \gamma \|k\|_2^2, \tag{5}$$

where $D^T$ is some sparse transformation (such as Wavelet, Curvelet,
among others [1]) or sparsity-inducing operator (such as handcrafted
derivative filters or filters learned from training images
[6, 14, 10, 13]). When $D$ is orthonormal, we have $\alpha = D^T x$ as
the transform coefficients, and thus we can rewrite Eqn. (5) as:

$$\{\hat{\alpha}, \hat{k}\} = \arg\min_{\alpha, k} \|k * D\alpha - y\|_2^2 + \lambda \|\alpha\|_1 + \gamma \|k\|_2^2. \tag{6}$$

To achieve better sparsity for the representation $\alpha$, $D$ can be
generalized to be non-orthogonal and over-complete, by either combining
different orthonormal bases or learning from the data [19]. In this
paper, we model $D$ as the training data directly, which closely
relates the solution $\alpha$ to recognition, as we discuss in the
following.

2.3.  Sparse  Representation  for  Recognition 

The application of sparse representation to classification is based
on the assumption that data samples belonging to the same class live in
the same subspace of a much lower dimension. Thus a new test sample can
be well represented by the training samples of the same class, which
leads to a natural sparse representation over the whole training set.
Casting the recognition problem as one of finding a sparse
representation of the test image in terms of the training set as a
whole, up to some sparse errors due to occlusion, Wright et al. [18]
showed that such a simple sparse representation based approach is
robust to partial occlusions and can achieve promising recognition
accuracy on public face datasets. This idea is further extended in
their later work [15] to handle face misalignment. Formally, given a
set of training samples for the $c$-th class
$D_c = [d_{c,1}, d_{c,2}, \ldots]$, a test sample $x$ from class $c$
can be well represented by $D_c$ with coefficients $\alpha_c$. As the
label for $x$ is unknown, it is assumed that $\alpha_c$ can be
recovered from the sparse representation of $x$ in terms of the
dictionary constructed from training samples of all $C$ classes by


$$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.} \quad \|D\alpha - x\|_2^2 \le \epsilon, \tag{7}$$

where $D = [D_1, D_2, \cdots, D_C]$ and
$\alpha = [\alpha_1^T, \alpha_2^T, \cdots, \alpha_C^T]^T$. Then the
label for the test sample $x$ is determined as the class which gives
the minimum reconstruction error:

$$\hat{c} = \arg\min_{c} \|D\,\delta_c(\hat{\alpha}) - x\|_2^2 = \arg\min_{c} \|D_c \hat{\alpha}_c - x\|_2^2.$$

$\delta_c(\cdot)$ is an indicator function keeping the elements
corresponding to the $c$-th class while setting the rest to zero.
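The class decision rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `src_classify` and the `labels` array (one class label per column of $D$) are hypothetical, and the sparse code `alpha` is assumed to have been computed beforehand by some solver for (7).

```python
import numpy as np

def src_classify(D, labels, alpha, x):
    """Return the class whose coefficients alone best reconstruct x."""
    labels = np.asarray(labels)
    best_class, best_residual = None, np.inf
    for c in np.unique(labels):
        # delta_c(alpha): keep the coefficients of class c, zero the rest.
        alpha_c = np.where(labels == c, alpha, 0.0)
        residual = np.sum((D @ alpha_c - x) ** 2)
        if residual < best_residual:
            best_class, best_residual = c, residual
    return best_class
```

For example, with `labels = [0, 0, 1, 1]` and a code whose energy sits almost entirely on the first two atoms, the residual for class 0 is the smaller one and class 0 is returned.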


3.  Joint  Blind  Restoration  and  Recognition 
with  Sparse  Representation  Prior 

In this section, we present our joint restoration and recognition
framework in the blind situation, i.e., no a priori information on the
image degradation process of the blurry query image is available, and
develop an efficient minimization algorithm to solve the problem.

3.1.  Problem  Formulation 

In conventional recognition works, the test image $y$ is often assumed
to be captured under ideal conditions without any degradation, i.e.,
$y = x$. Some simple environmental variations, such as illumination and
mild misalignment, can be handled fairly well given enough training
samples [15]. In reality, however, we may only get an observation $y$
of $x$ with degradations, e.g., blur as in (3), which are hard to model
beforehand and can cause serious problems for the recognition task.
Therefore, recognition from a single blurry observation is a very
challenging task, especially in the blind situation (dubbed blind
recognition), i.e., when no a priori information is available about the
observation process. As far as we know, little work has been done on
this challenging blind recognition problem. In this work, we aim to
address the task of blind recognition by exploiting the interactions
between restoration and recognition with the sparse representation
prior. Formally, given the blurry observation $y$ and the sharp
training image set $D$, we want to estimate the latent sharp image $x$,
the blur kernel $k$, and the class label $c$ simultaneously by

$$\{\hat{x}, \hat{k}, \hat{c}\} = \arg\min_{x, k, c} E(x, k, c), \tag{8}$$

where

$$E(x, k, c) = \|k * x - y\|_2^2 + \eta \|x - D\alpha\|_2^2 + \lambda \|\alpha\|_1 + \tau \sum_{l=1}^{L} |e_l * x|^s + \gamma \|k\|_2^2. \tag{9}$$

We  explain  each  term  of  the  model  in  detail  as  follows. 

1. The first term is the conventional reconstruction constraint, i.e.,
the restored image should be consistent with the observation with
respect to the estimated degradation model.

2. The second term means the recovered sharp image can be well
represented by the clean training set.

3. The third term enforces that the representation of the recovered
image in terms of the training set should be sparse. In other words,
the algorithm favors a solution $x$ that can be sparsely represented by
the training set. Meanwhile, this sparse representation also recognizes
the identity of the observation.

4. The fourth term is a general sparse prior for natural images, using
a sparse exponent $s$ on the responses of the derivative filters $e_l$
to further stabilize the solution, where typically
$0.5 \le s \le 0.8$.

5. The last term is merely an $\ell_2$-norm regularization that
stabilizes the blur kernel.

The basic idea of the model is that the restored image should have a
sparse representation in terms of the training images if the blur
kernel is correctly estimated, and meanwhile the sparse representation
itself identifies the observed target. On one hand, the sparse
representation prior effectively regularizes the solution space of the
possible latent images and blur kernels; on the other hand, a better
estimated blur kernel will promote better sparse representations for
recognition. As shown by Eqn. (9), our model combines restoration (6)
and recognition (7) in a unified framework based on the sparse
representation prior. Note that the proposed model is a general
framework which can handle different kinds of image degradations that
can be modeled by a linear operator, e.g., out-of-focus blur, various
motion blurs, translation misalignment, etc.


3.2.  Optimization  Procedure 

The proposed model (9) involves multiple variables and is hard to
minimize directly. We adopt the alternating minimization scheme
advocated by recent sparse optimization and image deblurring works
[16, 14, 10, 2], which reduces the original problem to several simpler
subproblems. Following this scheme, we address the subproblems for each
of the optimization variables in an alternating fashion and present an
overall efficient optimization algorithm. In each step, our algorithm
reduces the objective function value, and thus will converge to a local
minimum. To start, we initialize the sparse representation $\alpha$ as
that recovered from $y$ with respect to $D$, and the latent sharp image
$x$ as $D\alpha$.

3.2.1  Blur  Kernel  Estimation:  Optimizing  for  k 

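In this subproblem, we fix all other variables and optimize the image
blur kernel $k$ by

$$\hat{k} = \arg\min_{k} \|x * k - y\|_2^2 + \gamma \|k\|_2^2. \tag{10}$$

This is a least-squares problem with Tikhonov regularization, which
leads to a closed-form solution for $k$:

$$\hat{k} = \mathcal{F}^{-1}\left( \frac{\overline{\mathcal{F}(x)} \circ \mathcal{F}(y)}{\overline{\mathcal{F}(x)} \circ \mathcal{F}(x) + \gamma I} \right),$$

where $\mathcal{F}(\cdot)$ denotes the Fast Fourier Transform (FFT),
$\mathcal{F}^{-1}(\cdot)$ denotes the inverse FFT,
$\overline{\mathcal{F}(\cdot)}$ denotes the complex conjugate of
$\mathcal{F}(\cdot)$, and "$\circ$" denotes element-wise
multiplication; the division is also element-wise.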
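The closed-form kernel update can be sketched with NumPy's FFT routines. This is a minimal illustration, not the authors' implementation: it assumes circular (periodic) convolution, returns a kernel of the same size as the image, and the function name `estimate_kernel` is hypothetical. Cropping to the kernel support, non-negativity and normalization, which practical deblurring code usually applies, are left out.

```python
import numpy as np

def estimate_kernel(x, y, gamma):
    """Tikhonov-regularized kernel estimate in the Fourier domain.

    x: current latent image, y: blurry observation (same 2-D shape),
    gamma: weight of the squared l2 penalty on the kernel.
    """
    Fx = np.fft.fft2(x)
    Fy = np.fft.fft2(y)
    # Element-wise: conj(F(x)) * F(y) / (conj(F(x)) * F(x) + gamma)
    K = np.conj(Fx) * Fy / (np.conj(Fx) * Fx + gamma)
    # The minimizer is real for real inputs; discard numerical imaginary residue.
    return np.real(np.fft.ifft2(K))
```

With a noise-free observation and a very small `gamma`, the estimate reproduces the kernel that generated `y`, which is a convenient sanity check for the formula.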

3.2.2  Latent  Image  Recovery:  Optimizing  for  x 

Given the current kernel estimate $k$ and sparse representation
$\alpha$, we want to update the estimate of the latent sharp image $x$.
The optimization problem (9) becomes

$$\hat{x} = \arg\min_{x} \|x * k - y\|_2^2 + \eta \|x - D\alpha\|_2^2 + \tau \sum_{l=1}^{L} |e_l * x|^s. \tag{11}$$

This optimization problem can be solved efficiently with variable
substitution and FFT [16, 14, 10, 2]. Introducing new auxiliary
variables $u_l$ ($l \in \{1, 2, \cdots, L\}$), we can rewrite the
energy function in (11) as:

$$E(x, u) = \|x * k - y\|_2^2 + \eta \|x - D\alpha\|_2^2 + \tau \sum_{l=1}^{L} |u_l|^s + \beta \sum_{l=1}^{L} \|u_l - e_l * x\|_2^2, \tag{12}$$

which can be divided into two subproblems: the x-subproblem and the
u-subproblem. In the x-subproblem, the energy function to be minimized
becomes

$$E(x) = \|x * k - y\|_2^2 + \eta \|x - D\alpha\|_2^2 + \beta \sum_{l=1}^{L} \|e_l * x - u_l\|_2^2,$$

which can be solved efficiently using FFT as:

$$\hat{x} = \mathcal{F}^{-1}\left( \frac{\overline{\mathcal{F}(k)} \circ \mathcal{F}(y) + \eta \mathcal{F}(D\alpha) + \beta \sum_{l=1}^{L} \overline{\mathcal{F}(e_l)} \circ \mathcal{F}(u_l)}{\overline{\mathcal{F}(k)} \circ \mathcal{F}(k) + \eta I + \beta \sum_{l=1}^{L} \overline{\mathcal{F}(e_l)} \circ \mathcal{F}(e_l)} \right).$$
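The x-subproblem update can be sketched as follows. This is an illustrative sketch under assumptions: convolution is treated as circular, small filters are zero-padded at the top-left corner (so the same convention must be used when forming the blur and the filter responses), and the names `psf_to_otf` and `update_latent_image` are hypothetical. `Da` stands for the image $D\alpha$ reshaped to the size of `y`.

```python
import numpy as np

def psf_to_otf(kernel, shape):
    """Zero-pad a small 2-D filter to `shape` and return its 2-D FFT."""
    padded = np.zeros(shape)
    padded[:kernel.shape[0], :kernel.shape[1]] = kernel
    return np.fft.fft2(padded)

def update_latent_image(y, k, Da, u_list, e_list, eta, beta):
    """Closed-form minimizer of the quadratic x-subproblem energy E(x)."""
    Fk = psf_to_otf(k, y.shape)
    # Data term and sparse-representation term.
    num = np.conj(Fk) * np.fft.fft2(y) + eta * np.fft.fft2(Da)
    den = np.abs(Fk) ** 2 + eta
    # Coupling with the auxiliary variables u_l of each derivative filter e_l.
    for e, u in zip(e_list, u_list):
        Fe = psf_to_otf(e, y.shape)
        num = num + beta * np.conj(Fe) * np.fft.fft2(u)
        den = den + beta * np.abs(Fe) ** 2
    return np.real(np.fft.ifft2(num / den))
```

Because `eta` is positive, the denominator never vanishes, so the division is well defined even at frequencies where the kernel response is zero.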

In the u-subproblem, $u_l$ can be estimated by solving the following
problem given fixed $x$:

$$\hat{u}_l = \arg\min_{u_l} \tau |u_l|^s + \beta \|u_l - e_l * x\|_2^2, \tag{13}$$

which can be solved efficiently over each dimension separately [10]. In
practice, we use the first-order derivative filters
$\{e_1 = [1, -1],\ e_2 = [1, -1]^T\}$ and set $s = 0.5$ as in [10]. We
follow the multi-scale estimation scheme for stable estimation of the
blur kernel $k$ and latent sharp image $x$ as in [6, 14, 2].
Conventional schemes such as structure prediction have also been
incorporated into the optimization [2].
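Since (13) decouples over pixels, each element reduces to a scalar problem of the form $\min_u \tau |u|^s + \beta (u - v)^2$, where $v$ is the corresponding filter response. The sketch below solves it by a brute-force grid search. This is a stand-in chosen for clarity and is not the solver of [10]; the function name `solve_u` and the grid size are assumptions. It builds an array of size (number of pixels) x `n_grid`, so it is only meant for small images.

```python
import numpy as np

def solve_u(v, tau, beta, s=0.5, n_grid=2001):
    """Per-element minimizer of tau * |u|**s + beta * (u - v)**2 by grid search."""
    v = np.asarray(v, dtype=float)
    mag = np.abs(v).ravel()
    # The minimizer has the sign of v and a magnitude in [0, |v|],
    # so only that interval is searched.
    t = np.linspace(0.0, 1.0, n_grid)          # fractions of |v|
    cand = mag[:, None] * t[None, :]           # candidate magnitudes per pixel
    cost = tau * cand ** s + beta * (cand - mag[:, None]) ** 2
    best = cand[np.arange(mag.size), np.argmin(cost, axis=1)]
    return (np.sign(v).ravel() * best).reshape(v.shape)
```

Small responses are driven exactly to zero while large ones are only slightly shrunk, which is the sparsifying behaviour the fourth term of (9) is meant to produce.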

3.2.3  Sparse  Projection:  Optimizing  for  a 

With the recovered kernel $k$ and sharp training set $D$, we can
generate the corresponding blurry dictionary $D_b$ via

$$D_b = D * k, \tag{14}$$

where the convolution $*$ is performed on each column of $D$ with $k$.
Then the sparse representation vector $\alpha$ can be updated by

$$\hat{\alpha} = \arg\min_{\alpha} \|D_b \alpha - y\|_2^2 + \lambda \|\alpha\|_1, \tag{15}$$

from which the classification decision is made using

$$\hat{c} = \arg\min_{c} \|D_b\,\delta_c(\hat{\alpha}) - y\|_2^2. \tag{16}$$

We do not use the deblurred image and the sharp training set to compute
the sparse representation $\alpha$ because the deblurring process may
introduce artifacts, which are disadvantageous for recovery and
recognition. Based on compressive sensing theory, we can recover the
sparse representation using the blurry observation $y$ and the blurry
dictionary $D_b$ and thus circumvent the above problem. The overall
algorithm optimizes over the blur kernel $k$, latent sharp image $x$,
sparse representation $\alpha$ and class label $c$ alternately.
Algorithm 1 describes the procedure of our joint blind restoration and
recognition algorithm.
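The construction of the blurry dictionary in (14) can be sketched as below. The sketch assumes that each column of $D$ is a training image vectorized in row-major order, that convolution is circular, and that the kernel is centred at its middle pixel; the function name `blur_dictionary` and the `image_shape` argument are hypothetical.

```python
import numpy as np

def blur_dictionary(D, k, image_shape):
    """Blur every column (a vectorized training image) of D with kernel k."""
    h, w = image_shape
    padded = np.zeros((h, w))
    padded[:k.shape[0], :k.shape[1]] = k
    # Move the kernel centre to the origin so blurred images are not shifted.
    padded = np.roll(padded, (-(k.shape[0] // 2), -(k.shape[1] // 2)), axis=(0, 1))
    Fk = np.fft.fft2(padded)
    images = D.T.reshape(-1, h, w)             # one image per column of D
    blurred = np.real(np.fft.ifft2(np.fft.fft2(images) * Fk))
    return blurred.reshape(-1, h * w).T
```

The returned matrix has the same shape as `D`, so it can be passed directly to an $\ell_1$ solver for (15) and to the residual-based class decision in (16).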

Algorithm 1: Joint Blind Image Restoration and Recognition with
Sparse Representation Prior.

Input: a blurry image y, training image set D
Output: estimated blur kernel k, restored image x, and the class label c
Initialization: sparse vector α recovered from y in terms of D, and x = Dα.
for t = 1, 2, ..., T do
    Kernel Estimation: update the kernel k by minimizing Eqn. (10);
    Image Estimation: update the latent image estimate x by minimizing Eqn. (11);
    Sparse Projection: recover the sparse coefficients α by minimizing Eqn. (15);
    Classification: estimate the class label c from Eqn. (16).
end

4.  Experiments  and  Results

In this section, we present several experiments to demonstrate the
effectiveness of the proposed JRR method in terms of both restoration
accuracy and recognition accuracy. The Extended Yale B [7] (48 x 42)
and CMU Multi-PIE [8] (80 x 60) datasets are used for evaluation in
this work. The Extended Yale B dataset contains 38 individuals, each
with 64 near-frontal-view images under different illuminations. For the
CMU Multi-PIE dataset, we use the frontal images with neutral
expression under varying illuminations from session 1 for computational
considerations.

For restoration, we compare our algorithm with the fast deblurring
method in [2], one of the state-of-the-art blind deblurring algorithms.
Root Mean Square Error (RMSE) is employed to compare the estimation
accuracy for both the blur kernel and the restored image. For
classification, we compare our JRR algorithm with the following
methods: (1) SVM: classification with a linear SVM trained on the sharp
training set; (2) SRC: directly feed the blurry observation into the
sparse representation based classification algorithm [18]; and
(3) SRC-B: first estimate the kernel and then generate a blurred
training set for SRC.¹

¹ Another approach is first to deblur the test image and then use the
deblurred image for recognition. Empirically, we observe that this
method may perform even worse than using the original blurry image
directly, mainly due to the artifacts induced by the deblurring step
(Figure 4), and thus we do not compare with this method in the sequel.

4.1.  An  Illustrative  Example

Figure 2. The joint blind restoration and recognition optimization
process for 5 iterations. Top row, left to right: ground-truth sharp
image, blurry test image, and the restored images from iteration 1 to
5. The ground-truth and estimated PSFs are framed in red and green
borders respectively. Bottom row, left: sparsity of the recovered
sparse coefficients in terms of SCI; right: restoration errors in terms
of RMSE.

We illustrate the proposed method with a simple example in Figure 2.
Given a blurry observation, we jointly recover the blur kernel, the
latent sharp image, and the class label in an iterative way. Figure 2
shows that, as the number of optimization iterations increases, the
latent representation becomes sparser and sparser, as indicated by the
increase of the Sparsity Concentration Index (SCI) measure², which
implies that the underlying class label of the test image can be
determined with increasing confidence. At the same time, the restored
image resembles the ground truth more and more closely, as indicated by
the decrease of the restoration error, which means that the estimated
blur kernel becomes more and more accurate. In fact, in the first
iteration, the blurry input is wrongly assigned the class label of
subject 4, while the ground-truth label is subject 1. After the second
iteration, with a better restored image and kernel, the algorithm
correctly finds the true class label. This illustrates that our
approach can effectively regularize the ill-posed blind image
restoration in pursuit of the sparsest representation for recognition.
On one hand, a better recovered image will have a more meaningful
sparse representation for recognition; on the other hand, the updated
sparse representation, tightly connected with recognition, provides a
powerful regularization for the subsequent blind image restoration. In
practice, we notice that the joint optimization process converges very
quickly, typically in no more than 4 iterations. Therefore, we fix the
number of iterations to 4 in all the following experiments.

² SCI is defined as
$\mathrm{SCI}(x) = \frac{C \cdot \max_i \|\delta_i(x)\|_1 / \|x\|_1 - 1}{C - 1}$,
where $C$ is the total number of classes [18].
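The SCI measure used in this example can be computed directly from a coefficient vector. The sketch below assumes the definition from [18], in which the index equals 1 when all coefficient energy lies in a single class and 0 when it is spread evenly over all classes; the function name and the `labels` array (one class label per coefficient) are hypothetical.

```python
import numpy as np

def sparsity_concentration_index(alpha, labels):
    """SCI of a coefficient vector, given the class label of each coefficient."""
    alpha = np.asarray(alpha, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    C = classes.size
    total = np.sum(np.abs(alpha))
    if total == 0 or C < 2:
        return 0.0                              # undefined case, treated as 0 here
    # l1 mass of the coefficients belonging to each class.
    per_class = np.array([np.sum(np.abs(alpha[labels == c])) for c in classes])
    return (C * per_class.max() / total - 1.0) / (C - 1.0)
```

Tracking this value over the iterations of the joint optimization gives the kind of curve described for the bottom-left panel of Figure 2.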

4.2.  Joint  Blind  Image  Restoration  and  Recognition 

In this subsection, we conduct experiments on joint image restoration
and recognition for face images under various blind degradation
settings. In our JRR algorithm, the tasks of image restoration and
recognition are tightly coupled. However, to facilitate comparison with
conventional restoration and recognition approaches respectively, we
present the results for restoration and recognition separately in the
sequel.

4.2.1  Blind  Image  Restoration 

We first quantitatively evaluate the kernel estimation and image
restoration accuracy on the Extended Yale B face dataset. To be
consistent with the recognition evaluation, we randomly select half of
the images for each subject as the training set. We then randomly
choose 10 images from the rest as our testing examples for restoration.
For each test image, we generate its blurry images using the 8
realistic non-parametric complex blur kernels proposed by Levin et al.
in [11], shown in the first row of Table 2. Given a blurry input, our
JRR algorithm estimates the unknown blur kernel without any prior
knowledge and recovers the underlying sharp latent image, which are
then evaluated in terms of RMSE with respect to the ground truth. We
compare our JRR algorithm with the fast deblurring method in [2].

Figure 3. Restoration results comparison in terms of RMSE.
(a) kernel estimation; (b) image estimation.

Figure 3 (a) shows the average RMSE for each estimated kernel given
the blurry inputs, where our JRR method improves the kernel estimation
accuracy substantially compared with the fast deblurring algorithm.
This can be explained by the fact that face images lack strong edge
structures, especially in the case of blurry observations, which
presents a great challenge to existing blind deblurring methods. With
the sparse representation prior, however, our method is much more
robust in estimating the complex blur kernels. Figure 3 (b) compares
the average restoration RMSEs for the 10 images under the 8 complex
kernels. Due to the incorporation of the sparse representation prior,
our algorithm improves the restoration accuracy significantly over the
fast deblurring method for all the test images. By exploiting the
sparse representation prior, the restored image has more details and
fewer artifacts (Figure 4), implying a more accurate sparse
representation, thus facilitating recognition, as shown in the
following.

4.2.2  Blind  Image  Recognition 

For recognition, we first evaluate the performance of the
proposed method on the Extended Yale B dataset.
We randomly select half of the images of each subject for
training, and use the rest for testing. To generate the blurry
inputs, we add two simple parametric blur kernels, i.e., a
linear motion kernel and a Gaussian blur kernel, in
addition to the eight complex blur kernels [11]. For each
blur kernel, we generate a set of blurred testing images,
giving 10 testing sets in total. Table 1 summarizes the
recognition results for a simple motion blur (length 10
pixels, angle 45 degrees) and a Gaussian kernel (standard
deviation 3), where the kernel size is 9 x 9. Our JRR
algorithm outperforms SRC remarkably, while performing slightly better


Table 1. Recognition rate (%) on Extended Yale B under simple
parametric blur kernels.

Kernel Type   SVM    SRC    SRC-B   JRR
Motion        40.0   68.7   85.3    86.0
Gaussian      29.9   57.7   84.8    84.8

Table 2. Recognition accuracy (%) on Extended Yale B under
complex non-parametric blur kernels.

Kernel   1      2      3      4      5      6      7      8
Size     19     17     15     27     13     21     23     23
SVM      45.9   27.2   45.8   11.2   43.5   48.4   20.9   16.9
SRC      79.8   54.1   74.9   21.3   65.5   83.5   36.6   30.3
SRC-B    80.6   79.3   73.4   33.0   70.1   76.8   51.9   51.9
JRR      86.2   79.3   85.7   43.1   81.9   86.4   64.7   54.8

than SRC-B. This is because the conventional blind deblurring
method can estimate the blur kernel reasonably well in
the simple blur model case. Table 2 presents the recognition
results under the complex non-parametric blur kernels. In this
case, conventional blur kernel estimation methods fail easily
due to the complexity of the kernels and the lack of strong
structures in the face images, and as a result, the recognition
results of our JRR algorithm outperform those of SRC-B
and SRC by a large margin in most cases.
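
The two simple parametric kernels behind Table 1 (a 9 x 9 Gaussian kernel with standard deviation 3, and a linear motion kernel of length 10 pixels at 45 degrees) can be sketched roughly as follows. This is an illustrative construction under our own assumptions, not the exact generator used in the experiments: the function names, the nearest-pixel rasterization of the motion path, and the number of samples along the path are all hypothetical choices.

```python
import numpy as np


def gaussian_kernel(size=9, sigma=3.0):
    """Isotropic Gaussian blur kernel on a size x size grid, normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def motion_kernel(size=9, length=10.0, angle_deg=45.0, samples=101):
    """Linear motion kernel: a line segment through the kernel center.

    The segment is sampled densely and each sample is assigned to its
    nearest pixel (an assumed rasterization); samples that fall outside
    the support are dropped before normalization.
    """
    c = (size - 1) / 2.0
    t = np.linspace(-length / 2.0, length / 2.0, samples)
    theta = np.deg2rad(angle_deg)
    rows = np.rint(c - t * np.sin(theta)).astype(int)  # image rows grow downward
    cols = np.rint(c + t * np.cos(theta)).astype(int)
    inside = (rows >= 0) & (rows < size) & (cols >= 0) & (cols < size)
    k = np.zeros((size, size))
    np.add.at(k, (rows[inside], cols[inside]), 1.0)
    return k / k.sum()


g = gaussian_kernel()
m = motion_kernel()
```

Both kernels are normalized to unit sum so that blurring preserves the mean image intensity; a blurred test image is then the convolution of a sharp image with one of these kernels.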

We then evaluate our algorithm on the Multi-PIE [8] dataset,
with 15 images from each subject of Session 1 for training
and the rest of Session 1 for testing. Due to space
limitations, we only report the results for the third complex
kernel, as shown in Table 3. Again, our algorithm performs much
better than the other methods. Note that as the conventional
kernel estimation method is not robust enough in this case,
SRC-B performs even worse than SRC. We further evaluate
our algorithm in a more realistic scenario, where the blur
kernel for generating a blurry image is not fixed but randomly
chosen from {Linear Motion kernel, Gaussian kernel,
Nonparametric Complex kernel, Delta (no blur)}. The
recognition results for this case are shown in Table 4, and
our proposed JRR method outperforms all the other methods
by large margins on both datasets.

Finally, to visually demonstrate the effectiveness of our
JRR algorithm, we compare the estimated kernels, deblurred
images, and the top-10 selected atoms with the
largest absolute coefficients from the sparse representations
under two different kernels, shown in Figure 4. The top row
shows the results of SRC; the middle row shows the results of
conventional blind deblurring followed by SRC; and the bottom
row shows our results. The blur kernels framed in red denote
the ground truth kernels, and those framed in green are the
estimated kernels. In both cases, our algorithm accurately
estimates the unknown blur kernels and outputs
sharp images close to the ground truth, while the fast
deblurring method is not robust and fails drastically for the


Table 3. Recognition rate (%) on Multi-PIE with the third complex
blur kernel.

Algorithm   SVM    SRC    SRC-B   JRR
Accuracy    84.8   85.2   79.1    91.4

Table 4. Recognition rate (%) with randomly chosen blur kernels on
both Extended Yale B and Multi-PIE.

Algorithm         SVM    SRC    SRC-B   JRR
Extended Yale B   57.0   68.8   66.3    73.7
Multi-PIE         49.4   53.6   54.9    61.3

complex kernel. To the right of each restored image, the top-10
atoms from the sharp training set are selected by the largest
absolute sparse representation coefficients, where red numbers
denote atoms chosen from the same class (correct) and
blue numbers denote otherwise (wrong). It is clear that our
JRR algorithm selects more atoms from the same class
with more concentrated large coefficients, indicating better
recognition ability.
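
The atom selection visualized in Figure 4 can be illustrated with a small sketch: rank the training atoms by the absolute value of their sparse coefficients, keep the top k, and check how many share the class of the test image. The function names, the toy coefficient vector, and the labels below are assumptions made for illustration; they are not taken from the experiments.

```python
import numpy as np


def top_k_atoms(alpha, labels, k=10):
    """Indices and class labels of the k atoms with largest |coefficient|."""
    alpha = np.asarray(alpha, dtype=np.float64)
    labels = np.asarray(labels)
    order = np.argsort(-np.abs(alpha), kind="stable")[:k]
    return order, labels[order]


def same_class_fraction(alpha, labels, true_label, k=10):
    """Fraction of the top-k atoms that belong to the true class."""
    _, top_labels = top_k_atoms(alpha, labels, k)
    return float(np.mean(top_labels == true_label))


# Toy example (assumed): six training atoms, three per class.
labels = np.array([0, 0, 0, 1, 1, 1])
alpha = np.array([0.9, -0.7, 0.0, 0.1, -0.05, 0.0])
order, top_labels = top_k_atoms(alpha, labels, k=3)
fraction = same_class_fraction(alpha, labels, true_label=0, k=3)
```

A higher same-class fraction, together with larger coefficients concentrated on those atoms, corresponds to the behavior described above for a well-restored image.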

However, a challenging situation arises when the blurry test
image suffers from extreme illumination, as in Figure 5,
where little information about the facial structures is kept
for deblurring. In this case, the deblurring task becomes
extremely challenging and the blur kernel may not be
correctly estimated even with our algorithm, which leads to
incorrect classification decisions. In both datasets we use,
there is in fact a notable number of such images,
which pose great challenges to the task of blind recognition
on these datasets. Yet, with the sparse representation prior,
the deblurring result of our algorithm looks much more
reasonable than that of the fast deblurring method.

5.  Conclusion  and  Future  Work 

We propose in this paper a joint restoration and recognition
method with the sparse representation prior, and
demonstrate its application to face recognition from a single
blurry image. By combining these two interacting tasks,
our algorithm demonstrates significant improvements over
treating them separately. In the current model, mild
translation misalignment between test and training images
can be captured and compensated by the blur kernel. For
future work, more complex alignment models, e.g., affine
transformation, can be incorporated into our framework to
handle more challenging misalignment between the
blurry test image and sharp training images with techniques
similar to [15] and [20]. Moreover, using a learned dictionary
rather than the training images directly in our model is also
interesting and worthy of investigation in the future.
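
The remark that a blur kernel can absorb mild translation misalignment follows from a basic property of convolution: convolving with a shifted delta kernel translates the image. The sketch below checks this numerically. It assumes circular boundary handling and a kernel whose origin is its center pixel; both are illustrative choices, and the helper name `circular_blur` is hypothetical.

```python
import numpy as np


def circular_blur(image, kernel):
    """Circular 2-D convolution via the FFT, with the kernel origin at its center."""
    kh, kw = kernel.shape
    padded = np.zeros_like(image, dtype=np.float64)
    padded[:kh, :kw] = kernel
    # Move the kernel center to index (0, 0) so a centered delta is the identity.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))


rng = np.random.default_rng(0)
image = rng.random((16, 16))

# A 5 x 5 delta kernel displaced from the center by (dy, dx) pixels.
dy, dx = 1, -2
delta = np.zeros((5, 5))
delta[2 + dy, 2 + dx] = 1.0

shifted = circular_blur(image, delta)
expected = np.roll(image, (dy, dx), axis=(0, 1))
```

Here `shifted` matches `expected` up to floating-point error, so a kernel estimate whose mass is displaced from the center encodes a translation as well as a blur.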
Figure 4. Image restoration results under (a) parametric PSF (Gaussian blur) and (b) realistic non-parametric PSF (27 x 27 non-parametric
motion blur). Top: SRC; Middle: conventional deblur + SRC; Bottom: JRR. The PSF kernels framed in red denote the ground-truth kernels
while those in green are estimated kernels. Atoms corresponding to the top-10 largest absolute coefficient values are shown together with
the absolute values for each method, with red indicating atoms selected from the same class as the test image.


Figure 5. Failure case analysis. (a) Ground truth image and kernel; (b) blurry input; estimated image and kernel using (c) the conventional
deblurring method [2] and (d) the proposed JRR method; (e) top-10 selected atoms with the JRR method. Kernel estimation is very
challenging due to the extreme illumination.


References 

[1]  J.-F.  Cai,  H.  Ji,  C.  Liu,  and  Z.  Shen.  Blind  motion  deblurring  from  a 
single  image  using  sparse  approximation.  In  CVPR,  2009.  2,  3 

[2]  S.  Cho  and  S.  Lee.  Fast  motion  deblurring.  In  SIGGRAPH  ASIA, 
2009.  2,  3,  4,  5,  6,  8 

[3]  M.  Das  Gupta,  S.  Rajaram,  N.  Petrovic,  and  T.  S.  Huang.  Restoration 
and  recognition  in  a  loop.  In  CVPR,  2005.  2 

[4] D. L. Donoho. For most large underdetermined systems of linear
equations the minimal ℓ1-norm solution is also the sparsest solution.
Comm. Pure Appl. Math, 59:797-829, 2004. 2

[5]  M.  Elad,  M.  A.  T.  Figueiredo,  and  Y.  Ma.  On  the  role  of  sparse 
and  redundant  representations  in  image  processing.  Proc.  of  IEEE, 
98(6):972-982,  2010.  2,  3 

[6]  R.  Fergus,  B.  Singh,  A.  Hertzmann,  S.  T.  Roweis,  and  W.  T.  Freeman. 
Removing  camera  shake  from  a  single  photograph.  In  SIGGRAPH, 
2006. 1, 2, 3, 5

[7]  A.  Georghiades,  P.  Belhumeur,  and  D.  Kriegman.  From  few  to  many: 
illumination  cone  models  for  face  recognition  under  variable  lighting 
and  pose.  IEEE  TPAMI,  23(6):643-660,  2001.  5 

[8]  R.  Gross,  I.  Matthews,  J.  Cohn,  T.  Kanade,  and  S.  Baker.  Multi-PIE. 
In IEEE Intl. Conf. Automatic Face and Gesture Recog., 2008. 5, 7

[9] P. H. Hennings-Yeomans, S. Baker, and B. V. Kumar. Simultaneous
super-resolution and feature extraction for recognition of
low-resolution faces. In CVPR, 2008. 2


[10]  D.  Krishnan  and  R.  Fergus.  Fast  image  deconvolution  using  hyper- 
laplacian  priors.  In  NIPS,  2009.  3,  4,  5 

[11]  A.  Levin,  Y.  Weiss,  F.  Durand,  and  W.  Freeman.  Understanding  and 
evaluating  blind  deconvolution  algorithms.  In  CVPR,  2009.  2,  6 

[12] M. Nishiyama, H. Takeshima, J. Shotton, T. Kozakaya, and O. Yamaguchi.
Facial deblur inference to improve recognition of blurred
faces. In CVPR, 2009. 2

[13]  S.  Roth  and  M.  J.  Black.  Fields  of  experts:  A  framework  for  learning 
image  priors.  In  CVPR,  2005.  3 

[14]  Q.  Shan,  J.  Jia,  and  A.  Agarwala.  High-quality  motion  deblurring 
from  a  single  image.  In  SIGGRAPH,  2008.  1,  2,  3,  4,  5 

[15] A. Wagner, J. Wright, A. Ganesh, Z. Zhou, and Y. Ma. Towards a
practical face recognition system: Robust registration and illumination
by sparse representation. In CVPR, 2009. 3, 7

[16] Y. Wang, J. Yang, W. Yin, and Y. Zhang. A new alternating minimization
algorithm for total variation image reconstruction. SIAM J.
Img. Sci., 1(3):248-272, 2008. 4

[17] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. S. Huang, and S. Yan.
Sparse representation for computer vision and pattern recognition.
Proc. of IEEE, 98(6):1031-1044, 2010. 2

[18]  J.  Wright,  A.  Yang,  A.  Ganesh,  S.  Sastry,  and  Y.  Ma.  Robust  face 
recognition  via  sparse  representation.  IEEE  TPAMI,  2009.  2,  3,  5 

[19]  J.  Yang,  J.  Wright,  T.  Huang,  and  Y.  Ma.  Image  super  resolution  as 
sparse  representation  of  raw  image  patches.  In  CVPR,  2008.  1,  2,  3 

[20] L. Yuan, J. Sun, L. Quan, and H.-Y. Shum. Blurred/non-blurred image
alignment using sparseness prior. In ICCV, 2007. 7


